Abstract

I describe the effects that certain factors have on loan agreements.

Introduction

This data set contains information about loans. It has more than 113,000 observations and 81 variables. The variables include data about the loan, the borrower, the lenders, and the investors. Studying this data should help in understanding the factors that affect loan agreements. For my personal benefit, I hope to better understand the factors that could help me obtain a comfortable mortgage and pay it off, so that I can finally own a house after wishing for one for years.

Slight Adjustments to the Data Set

I started by removing ambiguous employment statuses from the data set. I also removed outliers, which made the charts very difficult to read. In addition, I created a data frame with means and medians to better compare this information.
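The cleaning steps described here could be sketched in base R roughly as follows. This is only an illustration on toy data: the column name `EmploymentStatus`, the choice of "Other" and "Not available" as the ambiguous statuses, and the income cap used to define outliers are all assumptions, not the values used in the original analysis.

```r
# Toy stand-in for the loans data (the real data set has 81 variables).
loans <- data.frame(
  EmploymentStatus    = c("Employed", "Other", "Full-time", "Not available",
                          "Employed", "Self-employed", "Employed", "Full-time"),
  IncomeRange         = c("$25,000-49,999", "$25,000-49,999", "$50,000-74,999",
                          "$50,000-74,999", "$25,000-49,999", "$50,000-74,999",
                          "$25,000-49,999", "$50,000-74,999"),
  StatedMonthlyIncome = c(3000, 2500, 5000, 4800, 3200, 5200, 250000, 4900),
  stringsAsFactors    = FALSE
)

# Assumption: these are the statuses considered ambiguous.
ambiguous <- c("Other", "Not available")
loans2 <- loans[!(loans$EmploymentStatus %in% ambiguous), ]

# Assumption: incomes above a fixed cap are treated as outliers.
income_cap <- 25000
loans2 <- loans2[loans2$StatedMonthlyIncome <= income_cap, ]

# Mean and median monthly income per income range, in one data frame.
income_summary <- aggregate(
  StatedMonthlyIncome ~ IncomeRange,
  data = loans2,
  FUN  = function(x) c(mean = mean(x), median = median(x))
)
# aggregate() returns a matrix column here; flatten it into plain columns.
income_summary <- do.call(data.frame, income_summary)
names(income_summary) <- c("IncomeRange", "MeanIncome", "MedianIncome")
print(income_summary)
```

Keeping the means and medians side by side makes it easy to spot groups where a few large values pull the mean away from the median.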

Univariate Analysis

## loans2$IncomeRange: $0
## NULL
## -------------------------------------------------------- 
## loans2$IncomeRange: $1-24,999
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.083  1166.667  1583.333  1427.846  1833.333 10000.000 
## -------------------------------------------------------- 
## loans2$IncomeRange: $100,000+
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.083  9083.333 10166.667 11055.064 12500.000 20825.000 
## -------------------------------------------------------- 
## loans2$IncomeRange: $25,000-49,999
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.083 2666.667 3166.667 3139.743 3593.458 9500.000 
## -------------------------------------------------------- 
## loans2$IncomeRange: $50,000-74,999
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.083 4500.000 5000.000 5025.569 5500.000 9688.917 
## -------------------------------------------------------- 
## loans2$IncomeRange: $75,000-99,999
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4141    6583    7000    7056    7500   13333 
## -------------------------------------------------------- 
## loans2$IncomeRange: Not displayed
## NULL
## -------------------------------------------------------- 
## loans2$IncomeRange: Not employed
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.083    0.083  856.000 1291.281 1507.000 9096.000
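The summaries above have the shape of output produced by base R's `by()` combined with `summary()`. A minimal sketch on toy data follows, assuming the summarized variable is `StatedMonthlyIncome` (the output itself does not name it). It also shows why some groups print as `NULL`: a factor level with no remaining rows still appears in the result.

```r
# Toy data: the "$0" level exists in the factor but has no rows.
loans2 <- data.frame(
  IncomeRange = factor(
    c("$1-24,999", "$1-24,999",
      "$25,000-49,999", "$25,000-49,999", "$25,000-49,999"),
    levels = c("$0", "$1-24,999", "$25,000-49,999")
  ),
  StatedMonthlyIncome = c(1200, 1800, 2700, 3100, 3500)
)

# One six-number summary per income range; empty levels come back as NULL.
income_by_range <- by(loans2$StatedMonthlyIncome, loans2$IncomeRange, summary)
print(income_by_range)

# Dropping unused factor levels first removes the NULL entries.
income_by_range_clean <- by(
  loans2$StatedMonthlyIncome,
  droplevels(loans2$IncomeRange),
  summary
)
print(income_by_range_clean)
```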

Bivariate Analysis

## 
##  Pearson's product-moment correlation
## 
## data:  loans2$BorrowerAPR and loans2$LenderYield
## t = 2322, df = 97711, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9909477 0.9911709
## sample estimates:
##     cor 
## 0.99106

## 
##  Pearson's product-moment correlation
## 
## data:  loans2$OpenCreditLines and loans2$StatedMonthlyIncome
## t = 91.229, df = 97711, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2743744 0.2859303
## sample estimates:
##       cor 
## 0.2801625
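Output like the two blocks above comes from `cor.test()`. The sketch below runs the same test on simulated data and pulls out the pieces that matter most when reading it. The simulated variables and their parameters are made up for illustration; only the function call mirrors the analysis.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-ins for two numeric loan variables (hypothetical values).
n <- 200
open_credit_lines     <- rpois(n, lambda = 9)
stated_monthly_income <- 3000 + 250 * open_credit_lines + rnorm(n, sd = 2500)

# Pearson correlation test, the default method of cor.test().
result <- cor.test(open_credit_lines, stated_monthly_income, method = "pearson")

r         <- unname(result$estimate)  # sample correlation coefficient
ci        <- result$conf.int          # 95 percent confidence interval for r
r_squared <- r^2                      # share of variance explained by a linear fit

print(result)
cat("r =", round(r, 3), " r^2 =", round(r_squared, 3), "\n")
```

Squaring the coefficient is a quick way to judge strength: a correlation of 0.28, for example, corresponds to a linear relationship that explains roughly 8% of the variance.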

Multivariate Analysis

Final Plots and Summary

The most interesting observations I found in this data were not the relationships between variables, but the lack of a relationship where I was expecting one. One thing I was curious about was which credit scores defaulted the most. Before plotting the chart, I pictured lower credit scores defaulting the most. However, since most of the loans are given to borrowers with scores in the vicinity of 700, these scores were also the ones reporting the most defaulted loans, likely because of the share of borrowers they represent.
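The difference between the number of defaults and the rate of defaults can be made concrete with a small sketch. All numbers below are hypothetical and chosen only to illustrate the effect; they are not taken from the loans data.

```r
# Hypothetical loan and default counts per credit score bin.
scores <- data.frame(
  CreditScoreBin = c("600-639", "640-679", "680-719", "720-759"),
  Loans          = c(500, 2000, 6000, 3000),
  Defaults       = c(100, 300, 600, 150),
  stringsAsFactors = FALSE
)

# Default rate: defaults relative to the number of loans in each bin.
scores$DefaultRate <- scores$Defaults / scores$Loans

# The bin with the most defaults is simply the bin with the most loans...
most_defaults <- scores$CreditScoreBin[which.max(scores$Defaults)]

# ...while the highest default rate belongs to the lowest score bin.
highest_rate <- scores$CreditScoreBin[which.max(scores$DefaultRate)]

print(scores)
cat("Most defaults:", most_defaults, "\n")
cat("Highest default rate:", highest_rate, "\n")
```

Plotting the rate per bin instead of the raw count would separate the effect of the credit score from the effect of how many borrowers fall in each bin.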

Default Distribution by Credit Score

Income by Credit Score Where Loans Defaulted

## 
##  Pearson's product-moment correlation
## 
## data:  loans2$CreditScoreRangeLower and loans2$StatedMonthlyIncome
## t = 63.256, df = 97711, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1923109 0.2043578
## sample estimates:
##       cor 
## 0.1983418

Income by Credit Score Where Loans Defaulted

One thing I was very confident about was that people with higher bankcard utilization would be more likely to fail to pay on time. But my assumption was wrong again. In fact, this is one of the weakest relationships I explored, with a correlation of almost 0.

## 
##  Pearson's product-moment correlation
## 
## data:  loans2$BankcardUtilization and loans2$CurrentDelinquencies
## t = -21.313, df = 97711, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.07426319 -0.06178106
## sample estimates:
##         cor 
## -0.06802478
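A very small p-value next to a correlation this close to 0 can look contradictory. With nearly 98,000 observations, even a tiny correlation is statistically distinguishable from 0. The sketch below uses the standard relation between the Pearson coefficient and its t statistic, t = r * sqrt(df / (1 - r^2)), with the values reported above; the variable names are only for illustration.

```r
# Values copied from the test output above.
r  <- -0.06802478
df <- 97711

# t statistic implied by r and the degrees of freedom.
t_stat <- r * sqrt(df / (1 - r^2))

# Share of variance in one variable explained by a linear fit on the other.
r_squared <- r^2

cat("t =", round(t_stat, 3), "\n")
cat("r^2 =", signif(r_squared, 2), "\n")
```

The t statistic is large only because the sample is large. The squared correlation is below half a percent, which is why the relationship is negligible in practice despite the significant test.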

Reflection

When I moved to this country, I had to build a credit score, since I had none. It was a struggle because no financial institution would even let me open a credit card with them, and how are you supposed to build a credit score if no one gives you the opportunity to build credit? Well, you start with a secured credit card, which essentially means paying the bank fees and interest to borrow your own money, but these payments are reported to the credit bureaus, and that’s how you start building a credit history. I knew that I would give my 100% to pay the bank back, but my telling them so did not mean anything, because they really didn’t know anything about me. Data speaks for itself.

Banks have performed this type of analysis thousands and thousands of times, likely in much more depth than this, so when they ask you about your credit score, your current debt, your income, and more, it is for a reason. Although you may think these factors do not apply to you because you know you will pay the loan back, the bank has no way of measuring your ability to pay just by listening to you say so. In a broader sense, these factors are accurate and help minimize losses for both borrowers and lenders.

From a technical perspective, most of the challenges I faced while building these plots came from finding a type of plot that depicts the data in a way that makes sense. I had the variables and knew what I was looking for, but building the plots was hard: I ended up with line graphs with far too much noise, bar graphs that made no sense, or scatter plots with lines of dots that were impossible to interpret.

To further improve this project, I would like to add pie charts. Currently I can build simple ones, but I was not able to build one with the data from this set, as I lacked the knowledge to do so, even after hours of researching online. I would also like to build better line graphs that actually have a continuous x-axis variable.
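One possible starting point for both ideas, using only base R graphics, is sketched below. The toy values are invented. The general approach is to count a categorical column with `table()` before calling `pie()`, and to average a numeric column per value of a numeric x variable before drawing a line.

```r
# Toy data using column names that appear in this analysis; values are made up.
loans2 <- data.frame(
  IncomeRange = c("$25,000-49,999", "$25,000-49,999", "$25,000-49,999",
                  "$25,000-49,999", "$50,000-74,999", "$50,000-74,999",
                  "$50,000-74,999", "$100,000+"),
  CreditScoreRangeLower = c(640, 680, 680, 720, 640, 680, 720, 720),
  BorrowerAPR           = c(0.30, 0.24, 0.26, 0.18, 0.28, 0.22, 0.16, 0.14),
  stringsAsFactors      = FALSE
)

# Pie chart: count loans per income range, then label slices with their share.
range_counts <- table(loans2$IncomeRange)
range_share  <- round(100 * prop.table(range_counts), 1)
pie(range_counts,
    labels = paste0(names(range_counts), " (", range_share, "%)"),
    main   = "Share of Loans by Income Range")

# Line graph with a continuous x axis: mean APR per credit score.
# aggregate() returns the groups sorted by score, so the line runs left to right.
apr_by_score <- aggregate(BorrowerAPR ~ CreditScoreRangeLower,
                          data = loans2, FUN = mean)
plot(apr_by_score$CreditScoreRangeLower, apr_by_score$BorrowerAPR,
     type = "l",
     xlab = "Credit Score (Lower Bound)",
     ylab = "Mean Borrower APR",
     main = "Mean Borrower APR by Credit Score")
```

Averaging per x value before plotting is also one way to reduce the noise that makes raw line graphs hard to read.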